将模型参数适应传入数据流是深度学习可伸缩性的关键因素。有趣的是,在线设置中的先前持续学习策略无意中将其更新的参数锚定在本地参数子空间中,以记住旧任务,否则会偏离子空间并忘记。从这个观察结果,我们在构建多个参数模式和每个模式分配任务之间建立了权衡。模式优化的任务分配(MOTA),我们的贡献适应策略,并行训练多个模式,然后优化每个模式的任务分配。我们从经验上证明了基线连续学习策略以及各种分配变化的改进,即子人群,领域和任务转变。
translated by 谷歌翻译
对多代理后门攻击的最新探索表明了反击效应,这是对后门攻击的自然防御效果,后门攻击是随机分类的。这产生了低精度W.R.T.的副作用干净的标签,激发本文在构建多代理后门防御的构建方面的工作,该防御能力最大化精确度W.R.T.清洁标签并将毒药标签的标签降至最低。我们建立在代理动力学和低损失子空间结构的基础上,我们贡献了三种防御能力,可提高多代理后门鲁棒性。
translated by 谷歌翻译
数字危害在移动生态系统中普遍存在。由于这些设备在日常生活中获得了更大的突出,因此太大了,因此增加了对个人的恶意攻击的潜力。最后一系列防御一系列数字伤害 - 包括数字分心,通过仇恨言论的政治极化,以及暴露于损坏材料的儿童 - 是用户界面。这项工作介绍了Greaeeterminator,使研究人员能够开发,部署和测试干预措施与最终用户的危害。我们展示了易于干预开发和部署,以及在五个深入案例研究中,潜在地覆盖了GreeSeterMinator的广泛危害。
translated by 谷歌翻译
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
translated by 谷歌翻译
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
translated by 谷歌翻译
Recent pre-trained language models have shown promising capabilities in generating fluent and realistic natural language text. However, generating multi-sentence text with global content planning has been a long-existing research question. Current approaches for controlled text generation can hardly address this issue, as they usually condition on single known control attributes. In this study, we propose a low-cost yet effective framework which explicitly models the global content plan of the generated text. Specifically, it optimizes the joint distribution of the natural language sequence and the global content plan in a plug-and-play manner. We conduct extensive experiments on the well-established Recipe1M+ benchmark. Both automatic and human evaluations verify that our model achieves the state-of-the-art performance on the task of recipe generation
translated by 谷歌翻译
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter inefficient as it always yields an entirely new model for each task. Currently, many research works propose to only fine-tune a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their corresponding fully fine-tuned counterparts. However, such kind of methods is still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why is the model more stable than the fully fine-tuned models? How to choose the tunable parameters? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. Then, we show that all of the methods are actually sparse fine-tuned models and conduct a novel theoretical analysis of them. We indicate that the sparsity is actually imposing a regularization on the original model by controlling the upper bound of the stability. Such stability leads to better generalization capability which has been empirically observed in a lot of recent research works. Despite the effectiveness of sparsity grounded by our theory, it still remains an open problem of how to choose the tunable parameters. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM) which approximates the original problem with an analytically solvable optimization function. The tunable parameters are determined by directly optimizing the approximation function. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis.
translated by 谷歌翻译
Cryo Focused Ion-Beam Scanning Electron Microscopy (cryo FIB-SEM) enables three-dimensional and nanoscale imaging of biological specimens via a slice and view mechanism. The FIB-SEM experiments are, however, limited by a slow (typically, several hours) acquisition process and the high electron doses imposed on the beam sensitive specimen can cause damage. In this work, we present a compressive sensing variant of cryo FIB-SEM capable of reducing the operational electron dose and increasing speed. We propose two Targeted Sampling (TS) strategies that leverage the reconstructed image of the previous sample layer as a prior for designing the next subsampling mask. Our image recovery is based on a blind Bayesian dictionary learning approach, i.e., Beta Process Factor Analysis (BPFA). This method is experimentally viable due to our ultra-fast GPU-based implementation of BPFA. Simulations on artificial compressive FIB-SEM measurements validate the success of proposed methods: the operational electron dose can be reduced by up to 20 times. These methods have large implications for the cryo FIB-SEM community, in which the imaging of beam sensitive biological materials without beam damage is crucial.
translated by 谷歌翻译
在原始文本中训练的语言模型(LMS)无法直接访问物理世界。 Gordon和Van Durme(2013)指出,LMS因此可能会遭受报告偏见的困扰:文本很少报告常见事实,而是关注情况的异常方面。如果LMS仅接受文本语料库的培训,并天真地记住当地的同时出现统计数据,那么他们自然会学会对物理世界的偏见。虽然先前的研究反复验证了较小尺度的LM(例如Roberta,GPT-2)放大了报告偏差,但在模型扩展时,这种趋势是否继续。我们从较大语言模型(LLM)(例如Palm和GPT-3)中从颜色的角度研究报告偏见。具体而言,我们查询llms对物体的典型颜色,这是一种简单的感知扎根的物理常识。令人惊讶的是,我们发现LLM在确定对象的典型颜色和更紧密地跟踪人类判断方面的表现明显优于较小的LMS,而不是过于适应文本中存储的表面图案。这表明,仅凭语言的大型语言模型就能克服以局部共发生为特征的某些类型的报告偏差。
translated by 谷歌翻译
神经网络语言模型的最新进展表明,通过利用大规模自然语言数据中的语言关联来得出表达意义表示。这些潜在的格式塔表示已实现许多实际应用的最新性能。看来我们正处于经验得出强大而表达的可计算语义的途径。出现的一个关键问题是,仅语言数据才能使计算机能够理解有关物理世界的必要真相?必须关注这个问题,因为我们与智能机器的未来相互作用取决于我们的技术正确地表示和处理人类通常观察到的概念(对象,属性和过程)。在审查了现有协议之后,这项工作的目的是使用新颖且严格控制的推理测试探索这个问题,并突出显示哪些模型可能直接从纯语言数据中学习。
translated by 谷歌翻译